
Add LLM Integration Tests #603

Open
wants to merge 16 commits into main

Conversation

devin-ai-integration[bot] (Contributor) commented Dec 24, 2024

🔍 Review Summary

Purpose:

  • Enhance the testing framework by integrating tests for multiple LLM providers.

Changes:

  • Configuration: Introduced environment variables for API keys in GitHub workflow and initialized various LLM providers.
  • Enhancement: Improved async handling and error management for AI21, Groq, Litellm, and Mistral providers.
  • Test: Expanded testing to include comprehensive integration tests for all LLM providers, covering both synchronous and asynchronous call patterns.
  • Dependencies: Updated tox.ini to include necessary test dependencies for new providers.

Impact:

  • Significantly enhances the reliability and coverage of our testing infrastructure, improving code quality and system integrity.
Original Description

Adds integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, AI21

This PR adds comprehensive integration tests for multiple LLM providers:

  • Anthropic (Claude)
  • Cohere
  • Groq
  • Litellm
  • Mistral
  • AI21

Each test verifies four types of calls (a minimal sketch of this shape follows the list):

  1. Synchronous (non-streaming)
  2. Synchronous (streaming)
  3. Asynchronous (non-streaming)
  4. Asynchronous (streaming)
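
A minimal sketch of that four-call shape, using the OpenAI client as the illustration (its snippets appear in the review comments below; the other providers follow the same pattern with their own SDKs):

```python
import os
from openai import OpenAI, AsyncOpenAI

def sync_no_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync no stream"}],
    )

def sync_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for _ in stream:  # consume the stream so the full call path is exercised
        pass

async def async_no_stream():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async no stream"}],
    )

async def async_stream():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    stream = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async streaming"}],
        stream=True,
    )
    async for _ in stream:
        pass
```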

The PR also:

  • Adds necessary test dependencies to tox.ini
  • Updates GitHub workflow with required environment variables
  • Adds debug prints for API key and session verification
  • Enables LLM call instrumentation in tests

Link to Devin run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e


Co-Authored-By: Alex Reibman <[email protected]>
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


Walkthrough

This update enhances the testing framework by integrating tests for multiple LLM providers, including Anthropic, Cohere, Groq, Litellm, Mistral, and AI21. Key changes include:

  • Environment Setup: Added API key environment variables in the GitHub workflow.
  • Provider Configuration: Initialized and configured new LLM providers in agentops/__init__.py.
  • Provider Enhancements: Refactored AI21, Groq, Litellm, and Mistral providers for better async handling and error management.
  • Testing: Comprehensive integration tests were added for all new providers, ensuring coverage of synchronous and asynchronous call patterns. OpenAI tests were updated to use the gpt-3.5-turbo model.
  • Dependencies: Updated tox.ini to include necessary test dependencies for the new providers.

These changes ensure robust testing and validation of LLM interactions across various providers.
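
In the tests, instrumentation is enabled before any provider call is made. A hypothetical sketch of that setup (the `instrument_llm_calls` flag and `end_session` call are assumptions about the agentops API, not taken from this diff):

```python
import os
import agentops

# Assumed initialization pattern: start a session and patch supported LLM clients
agentops.init(
    api_key=os.getenv("AGENTOPS_API_KEY"),
    instrument_llm_calls=True,  # assumed flag name for enabling LLM call recording
)

# ... run the sync/async provider calls under test here ...

agentops.end_session("Success")  # assumed call for closing the recorded session
```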

Changes

| File(s) | Summary |
| --- | --- |
| .github/workflows/python-testing.yml | Added environment variables for API keys of new LLM providers. |
| agentops/__init__.py | Initialized new LLM providers and configured them for testing. |
| agentops/llms/providers/ | Refactored and enhanced LLM providers (AI21, Groq, Litellm, Mistral) with improved async handling, error management, and response handling. |
| tests/ | Added comprehensive integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, and AI21 providers covering all call patterns. Updated OpenAI tests to use gpt-3.5-turbo. |
| tox.ini | Included test dependencies for new LLM providers. |

🔗 Related PRs

  • GH Actions: fix the pipeline #564: The PR updates GitHub Actions workflows to utilize the absolute uv managed runtime, improves the static analysis workflow, and fixes Python test actions to ensure proper environment management.
  • version bump #546: This pull request outlines changes made and includes sections for a description and testing validation.
  • deps: remove packaging; unpinned & ranged versioning #561: The pull request addresses several issues, clarifies that packaging is an implicit dependency of setuptools, restores a loose dependency on psutil, and caps all dependencies at their latest stable versions for security and performance reasons.
  • fix tests #562: The pull request addresses issues with newer PsUtil versions failing tests by removing unused arguments in tuples.
Instructions

Emoji Descriptions:

  • ⚠️ Potential Issue - May require further investigation.
  • 🔒 Security Vulnerability - Fix to ensure system safety.
  • 💻 Code Improvement - Suggestions to enhance code quality.
  • 🔨 Refactor Suggestion - Recommendations for restructuring code.
  • ℹ️ Others - General comments and information.

Interact with the Bot:

  • Send a message or request using the format:
    @bot + *your message*
Example: @bot Can you suggest improvements for this code?
  • Help the Bot learn by providing feedback on its responses.
    @bot + *feedback*
Example: @bot Do not comment on `save_auth` function !

Execute a command using the format:

@bot + */command*

Example: @bot /updateCommit

Available Commands:

  • /updateCommit ✨: Apply the suggested changes and commit them (or click the GitHub Action button to apply the changes).
  • /updateGuideline 🛠️: Modify an existing guideline.
  • /addGuideline ➕: Introduce a new guideline.

Tips for Using @bot Effectively:

  • Specific Queries: For the best results, be specific with your requests.
    🔍 Example: @bot summarize the changes in this PR.
  • Focused Discussions: Tag @bot directly on specific code lines or files for detailed feedback.
    📑 Example: @bot review this line of code.
  • Managing Reviews: Use review comments for targeted discussions on code snippets, and PR comments for broader queries about the entire PR.
    💬 Example: @bot comment on the entire PR.

Need More Help?

📚 Visit our documentation for detailed guides on using Entelligence.AI.
🌐 Join our community to connect with others, request features, and share feedback.
🔔 Follow us for updates on new features and improvements.

Comment on lines 33 to 42
```python
def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for chunk in stream_result:
        if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
            pass
```


🤖 Bug Fix:

Handle Stream Content in sync_stream
Ensure sync_stream processes or stores stream content to avoid logical errors.

🔧 Suggested Code Diff:
```python
for chunk in stream_result:
    if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
        # Process or store the content here
        print(chunk.choices[0].delta.content)
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
+import os
+import litellm
+
 def sync_stream():
     litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
     stream_result = litellm.completion(
         model="anthropic/claude-3-opus-20240229",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
         stream=True,
     )
     for chunk in stream_result:
         if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
-            pass
+            # Process or store the content here
+            print(chunk.choices[0].delta.content)
```
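
A variant of this suggestion that collects the streamed text and asserts it is non-empty would exercise the stream more strictly; a sketch (the assertion is illustrative and not part of this PR):

```python
import os
import litellm

def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    collected = []
    for chunk in stream_result:
        # Accumulate each streamed delta instead of printing it
        if hasattr(chunk, "choices") and chunk.choices[0].delta.content:
            collected.append(chunk.choices[0].delta.content)
    assert "".join(collected), "expected non-empty streamed content"
```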

Comment on lines 32 to 36
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
```


⚠️ Potential Issue:

Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and could impact the application's behavior and performance. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and test cases are updated accordingly. If not, consider reverting to 'gpt-4o-mini' or selecting a more suitable model.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
+        stream=True,
+    )
```

Comment on lines 42 to 46
```diff
 async def async_no_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async no stream"}],
```


⚠️ Potential Issue:

Model Change Verification in OpenAI Integration Test
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could impact the test's functionality and expected outcomes. It is crucial to verify that 'gpt-3.5-turbo' is the intended model for this test. If the change was unintentional, revert to 'gpt-4o-mini'. Ensure that the test requirements align with the capabilities of the new model to avoid unexpected results.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
-async def async_no_stream():
+import os
+from openai import AsyncOpenAI
+
+async def test_openai_integration():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async no stream"}],
+    )
```

Comment on lines 48 to 56

```diff
 async def async_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
```


⚠️ Potential Issue:

Model Change in API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could lead to different outputs or performance issues. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and tests are updated to reflect this modification. This will help maintain consistency and avoid potential confusion or errors in the future.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
-async def async_stream():
+import os
+from openai import AsyncOpenAI
+
+async def test_async_openai_integration():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
+        pass
```
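
For reference, async helpers like these are typically driven from a synchronous test body with asyncio.run; a sketch of the calling side (not part of the lines shown in this diff):

```python
import asyncio

def test_async_call_patterns():
    # Run the async helpers defined above to completion from a sync test
    asyncio.run(async_no_stream())
    asyncio.run(async_stream())
```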

Comment on lines 25 to 29
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
```


⚠️ Potential Issue:

Verify Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the test's behavior and results. It is crucial to confirm that this modification is intentional and aligns with the test's objectives. If the change is deliberate, ensure that the test expectations are updated to accommodate any differences in model behavior or output. If not, revert to the original model to maintain test integrity.

🔧 Suggested Code Diff:
```diff
 def sync_no_stream():
-    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-    client.chat.completions.create(
-        model="gpt-4o-mini",
+    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+    client.chat.completions.create(
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
     )
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
+    )
```
📜 Guidelines

Markdown:
• Use fenced code blocks and specify language when applicable
Python:
• Use f-strings or format methods for string formatting


Comment on lines 31 to 36

```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
```


⚠️ Potential Issue:

Review Change in OpenAI Model Version
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and should be carefully reviewed. This modification can impact the test's behavior and results, as different models may have varying capabilities and performance characteristics. Ensure that 'gpt-3.5-turbo' aligns with the test's objectives and does not introduce regressions. Additionally, update any related documentation or test expectations to reflect this change. Verify that the new model meets the requirements of the integration test, especially in terms of output consistency and performance.

🔧 Suggested Code Diff:
```diff
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
```
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
```diff
 def sync_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     stream_result = client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync streaming"}],
+        stream=True,
+    )
```

Comment on lines 48 to 56

```diff
 async def async_stream():
     client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     async_stream_result = await client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from async streaming"}],
         stream=True,
     )
     async for _ in async_stream_result:
```


⚠️ Potential Issue:

Model Change in async_stream Function
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the async_stream function could impact the function's behavior and output. It's crucial to ensure that 'gpt-3.5-turbo' meets the same requirements and expectations as 'gpt-4o-mini'. This change might introduce unexpected behavior or performance differences.

Actionable Steps:

  • Review the requirements and expected outputs for the async_stream function.
  • Conduct thorough testing to verify that 'gpt-3.5-turbo' produces the desired results.
  • Ensure no regressions are introduced with this model change.

This will help maintain the integrity and performance of the integration test.


Comment on lines 25 to 29
```diff
 def sync_no_stream():
     client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
     client.chat.completions.create(
-        model="gpt-4o-mini",
+        model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Hello from sync no stream"}],
```


⚠️ Potential Issue:

Model Change Verification Required
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the application's behavior and output. It is crucial to verify if this change aligns with the application's requirements and expected outcomes. If the change is intentional, ensure that all related documentation and tests are updated to reflect this modification. If not, consider reverting to the original model or selecting a more suitable alternative.


devin-ai-integration bot and others added 4 commits December 24, 2024 07:39
…sync, create_stream, create_stream_async)

Co-Authored-By: Alex Reibman <[email protected]>
- Remove try-except blocks to improve debugging
- Add blank lines after imports for consistent formatting
- Keep error handling minimal and explicit

Devin Run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e

Co-Authored-By: Alex Reibman <[email protected]>